{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# CPSC 330 hw3"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "import pandas as pd\n",
    "import matplotlib.pyplot as plt\n",
    "plt.rcParams['font.size'] = 16\n",
    "\n",
    "from sklearn.dummy import DummyClassifier\n",
    "from sklearn.tree import DecisionTreeClassifier\n",
    "from sklearn.linear_model import LogisticRegression\n",
    "from sklearn.model_selection import train_test_split, cross_val_score, cross_validate, GridSearchCV\n",
    "from sklearn.feature_extraction.text import CountVectorizer\n",
    "from sklearn.pipeline import Pipeline"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Instructions\n",
    "rubric={points:5}\n",
    "\n",
    "Follow the [homework submission instructions](https://github.com/UBC-CS/cpsc330/blob/master/docs/homework_instructions.md). "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Exercise 1: implementing `DummyClassifier`\n",
    "rubric={points:20}\n",
    "\n",
    "In this course (unlike CPSC 340) you will generally **not** be asked to implement machine learning algorihtms (like logistic regression) from scratch. However, this exercise is an exception: you will implement the simplest possible classifier, `DummyClassifier`.\n",
    "\n",
    "As a reminder, `DummyClassifier` is meant as a baseline and is generally the worst possible \"model\" you could \"fit\" to a dataset. All it does is predict the most popular class in the training set. So if there are more 0s than 1s it predicts 0 every time, and if there are more 1s than 0s it predicts 1 every time. For `predict_proba` it looks at the frequencies in the training set, so if you have 30% 0's 70% 1's it predicts `[0.3 0.7]` every time. Thus, `fit` only looks at `y` (not `X`).\n",
    "\n",
    "Below you will find starter code for a class called `MyDummyClassifier`, which has methods `fit()`, `predict()`, `predict_proba()` and `score()`. Your task is to fill in those four functions. To get your started, I have given you a `return` statement in each case that returns the correct data type: `fit` can return nothing, `predict` returns an array whose size is the number of examples, `predict_proba` returns an array whose size is the number of examples x 2, and `score` returns a number.\n",
    "\n",
    "The next code block has some tests you can use to assess whether your code is working. \n",
    "\n",
    "I suggest starting with `fit` and `predict`, and making sure those are working before moving on to `predict_proba`. For `predict_proba`, you should return the frequency of each class in the training data, which is the behaviour of `DummyClassifier(strategy='prior')`. Your `score` function should call your `predict` function. Again, you can compare with `DummyClassifier` using the code below.\n",
    "\n",
    "To simplify this question, you can assume **binary classification**, and furthermore that these classes are **encoded as 0 and 1**. In other words, you can assume that `y` contains only 0s and 1s. The real `DummyClassifier` works when you have more than two classes, and also works if the target values are encoded differently, for example as \"cat\", \"dog\", \"mouse\", etc."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "class MyDummyClassifier:\n",
    "    \"\"\"\n",
    "    A baseline classifier that predicts the most common class.\n",
    "    The predicted probabilities come from the relative frequencies\n",
    "    of the classes in the training data.\n",
    "    \n",
    "    This implementation only works when y only contains 0s and 1s.\n",
    "    \"\"\"\n",
    "    def fit(self, X, y):\n",
    "                \n",
    "        return None # Replace with your code \n",
    "    \n",
    "        \n",
    "    def predict(self, X):   \n",
    "        \n",
    "        return np.zeros(X.shape[0]) # Replace with your code \n",
    "\n",
    "    \n",
    "    def predict_proba(self, X):\n",
    "        \n",
    "        return np.zeros((X.shape[0],2)) # Replace with your code \n",
    "    \n",
    "    \n",
    "    def score(self, X, y):\n",
    "        \n",
    "        return 0.0 # Replace with your code \n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Below are some tests for `predict` using randomly generated data. You may want to run the cell a few times to make sure you explore the different cases (or automate this with a loop or random seeds)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# For testing, generate random data\n",
    "n_train = 101\n",
    "n_valid = 21\n",
    "d = 5\n",
    "X_train_dummy = np.random.randn(n_train, d)\n",
    "X_valid_dummy = np.random.randn(n_valid, d)\n",
    "y_train_dummy = np.random.randint(2, size=n_train)\n",
    "y_valid_dummy = np.random.randint(2, size=n_valid)\n",
    "\n",
    "my_dc = MyDummyClassifier()\n",
    "sk_dc = DummyClassifier(strategy=\"prior\")\n",
    "\n",
    "my_dc.fit(X_train_dummy, y_train_dummy);\n",
    "sk_dc.fit(X_train_dummy, y_train_dummy);\n",
    "\n",
    "assert np.array_equal(my_dc.predict(X_train_dummy), sk_dc.predict(X_train_dummy))\n",
    "assert np.array_equal(my_dc.predict(X_valid_dummy), sk_dc.predict(X_valid_dummy))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Below are some tests for `predict_proba`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "assert np.allclose(my_dc.predict_proba(X_train_dummy), sk_dc.predict_proba(X_train_dummy))\n",
    "assert np.allclose(my_dc.predict_proba(X_valid_dummy), sk_dc.predict_proba(X_valid_dummy))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Below are some tests for `score`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "assert np.isclose(my_dc.score(X_train_dummy, y_train_dummy), sk_dc.score(X_train_dummy, y_train_dummy))\n",
    "assert np.isclose(my_dc.score(X_valid_dummy, y_valid_dummy), sk_dc.score(X_valid_dummy, y_valid_dummy))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "nbgrader": {
     "grade": false,
     "grade_id": "cell-8e3cc53df86a7e14",
     "locked": true,
     "schema_version": 3,
     "solution": false,
     "task": false
    },
    "toc-hr-collapsed": true
   },
   "source": [
    "## Exercise 2: Trump Tweets\n",
    "\n",
    "For the rest of this assignment we'll be looking at a [dataset of Donald Trump's tweets](https://www.kaggle.com/austinreese/trump-tweets) as of June 2020. You should start by downloading the dataset. Unzip it and move the file `realdonaldtrump.csv` into this directory. As usual, please do not commit it to your repos. This assignment hopefully shipped with a `.gitignore` file that will prevent you from committing the dataset to your repo."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "tweets_df = pd.read_csv('realdonaldtrump.csv', index_col=0)\n",
    "tweets_df.head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "tweets_df.shape"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We will be trying to predict whether a tweet will go \"viral\", defined as having more than 10,000 retweets:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "y = tweets_df[\"retweets\"] > 10_000"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To make predictions, we'll be using only the content (text) of the tweet. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "X = tweets_df[\"content\"]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "For the purpose of this assignment, you can ignore all the other columns in the original dataset."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### 2(a) ordering the steps\n",
    "rubric={points:6}"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Let's start by building a model using `CountVectorizer` and `LogisticRegression`. The code required to do this has been provided below, but in the wrong order. \n",
    "\n",
    "- Rearrange the lines of code to correctly fit the model and compute the cross-validation score. \n",
    "- Add a short comment to each block to describe what the code is doing."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "tags": [
     "raises-exception"
    ]
   },
   "outputs": [],
   "source": [
    "# YOUR COMMENT HERE\n",
    "countvec = CountVectorizer(stop_words='english')\n",
    "\n",
    "# YOUR COMMENT HERE\n",
    "X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=321)\n",
    "\n",
    "# YOUR COMMENT HERE\n",
    "cross_val_results = pd.DataFrame(cross_validate(pipe, X_train, y_train, return_train_score=True))\n",
    "\n",
    "# YOUR COMMENT HERE\n",
    "pipe = Pipeline([('countvec', countvec), ('logreg', lr)])\n",
    "\n",
    "# YOUR COMMENT HERE\n",
    "cross_val_results.mean()\n",
    "\n",
    "# YOUR COMMENT HERE\n",
    "lr = LogisticRegression(max_iter=1000)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### 2(b) Cross-validation fold sub-scores\n",
    "rubric={points:3}\n",
    "\n",
    "Above we averaged the scores from the 5 folds of cross-validation. \n",
    "\n",
    "- Print out the 5 individual scores. Reminder: sklearn calls them `\"test_score\"` but they are really (cross-)validation scores. \n",
    "- Are the 5 scores close to each other or spread far apart? (This is a bit subjective, answer to the best of your ability.)\n",
    "- How does the size of this dataset (number of rows) compare to the cilantro dataset? How does this relate to the different sub-scores from the 5 folds?"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### 2(c) baseline\n",
    "rubric={points:3}\n",
    "\n",
    "By the way, are these scores any good? \n",
    "\n",
    "- Run `DummyClassifier` (or `MyDummyClassifier`!) on this dataset.\n",
    "- Compare the `DummyClassifier` score to what you got from logistic regression above. Does logistic regression seem to be doing anything useful?\n",
    "- Is it necessary to use `CountVectorizer` here? Briefly explain."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": []
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "nbgrader": {
     "grade": false,
     "grade_id": "cell-ba1f8ea22638cf75",
     "locked": true,
     "schema_version": 3,
     "solution": false,
     "task": false
    }
   },
   "source": [
    "#### 2(d) probability scores\n",
    "rubric={points:3}\n",
    "\n",
    "Here we train a logistic regression classifier on the entire training set: \n",
    "\n",
    "(Note: this is relying on the `pipe` variable from 2(a) - you'll need to redefine it if you overwrote that variable in between.)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "pipe.fit(X_train, y_train);"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Using this model, find the tweet in the **test set** with the highest predicted probability of being viral. Print out the tweet and the associated probability score.\n",
    "\n",
    "Reminder: you are free to reuse/adapt code from lecture. Please add in a small attribution, e.g. \"From Lecture 4\"."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": []
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "nbgrader": {
     "grade": false,
     "grade_id": "cell-f910e9d1d6d09182",
     "locked": true,
     "schema_version": 3,
     "solution": false,
     "task": false
    }
   },
   "source": [
    "#### 2(e) coefficients\n",
    "rubric={points:3}\n",
    "\n",
    "We can extract the `CountVectorizer` and `LogisticRegression` objects from the `Pipeline` object as follows:\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "vec_from_pipe = pipe.named_steps['countvec']\n",
    "lr_from_pipe  = pipe.named_steps['logreg']"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Using these extracted components above, display the 5 words with the highest coefficients and the 5 words with the smallest coefficients."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### 2(f)\n",
    "rubric={points:10}\n",
    "\n",
    "scikit-learn provides a lot of useful tools like `Pipeline`s and `cross_validate`, which are awesome. But with these fancy tools it's also easy to lose track of what is actually happening under the hood. Here, your task is to \"manually\" (without `Pipeline` and without `cross_validate` or `cross_val_score`) compute logistic regression's validation score on one fold (that is, train on 80% and validate on 20%) of the training data. \n",
    "\n",
    "You should start with the following `CountVectorizer` and `LogisticRegression` objects, as well as `X_train` and `y_train` (which you should further split):"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "countvec = CountVectorizer(stop_words='english')\n",
    "lr = LogisticRegression(max_iter=1000)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Meta-comment: you might be wondering why we're going into \"implementation\" here if this course is about _applied_ ML. In CPSC 340, we would go all the way down into `LogisticRegression` and understand how `fit` works, line by line. Here we're not going into that at all, but I still think this type of question (and Exercise 1) is a useful middle ground. I do want you to know what is going on in `Pipeline` and in `cross_validate` even if we don't cover the details of `fit`. To get into logistic regression's `fit` requires a bunch of math; here, we're keeping it more conceptual and avoiding all those prerequisites."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": []
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "toc-hr-collapsed": true
   },
   "source": [
    "## Exercise 3: hyperparameter optimization"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "nbgrader": {
     "grade": false,
     "grade_id": "cell-5e9e6fdea209d872",
     "locked": true,
     "schema_version": 3,
     "solution": false,
     "task": false
    }
   },
   "source": [
    "#### 3(a)\n",
    "rubric={points:2}\n",
    "\n",
    "The following code varies the `max_features` hyperparameter of `CountVectorizer` and makes a plot (with the x-axis on a log scale) that shows train/cross-validation scores vs. `max_features`. It also prints the results. Based on the plot/output, what value of `max_features` seems best? Briefly explain.\n",
    "\n",
    "Note: the code may take a minute or two to run. You can uncomment the `print` statement if you want to see it show the progress."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "train_scores = []\n",
    "cv_scores = []\n",
    "\n",
    "max_features = [10,100,1000,10_000,100_000]\n",
    "\n",
    "for mf in max_features:\n",
    "#     print(mf)\n",
    "    pipe = Pipeline([('countvec', CountVectorizer(stop_words='english', max_features=mf)), \n",
    "                         ('logreg', LogisticRegression(max_iter=1000))])\n",
    "    cv_results = cross_validate(pipe, X_train, y_train,  return_train_score=True)\n",
    "\n",
    "    train_scores.append(cv_results[\"train_score\"].mean())\n",
    "    cv_scores.append(cv_results[\"test_score\"].mean())\n",
    "\n",
    "plt.semilogx(max_features, train_scores, label=\"train\")\n",
    "plt.semilogx(max_features, cv_scores, label=\"valid\")\n",
    "plt.legend();\n",
    "plt.xlabel(\"max_features\");\n",
    "plt.ylabel(\"accuracy\");"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "pd.DataFrame({\"max_features\" : max_features, \"train\" : train_scores, \"cv\" : cv_scores})"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### 3(b)\n",
    "rubric={points:2}\n",
    "\n",
    "The following code varies the `C` hyperparameter of `LogisticRegression` and makes a plot (with the x-axis on a log scale) that shows train/cross-validation scores vs. `C`. Based on the plot, what value of `C` seems best?\n",
    "\n",
    "Note: the code may take a minute or two to run. You can uncomment the `print` statement if you want to see it show the progress."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "train_scores = []\n",
    "cv_scores = []\n",
    "\n",
    "C_vals = 10.0**np.arange(-1.5,2,0.5)\n",
    "\n",
    "for C in C_vals:\n",
    "#     print(C)\n",
    "    pipe = Pipeline([('countvec', CountVectorizer(stop_words='english', max_features=None)), \n",
    "                         ('logreg', LogisticRegression(max_iter=1000, C=C))])\n",
    "    cv_results = cross_validate(pipe, X_train, y_train,  return_train_score=True)\n",
    "\n",
    "    train_scores.append(cv_results[\"train_score\"].mean())\n",
    "    cv_scores.append(cv_results[\"test_score\"].mean())\n",
    "    \n",
    "plt.semilogx(C_vals, train_scores, label=\"train\")\n",
    "plt.semilogx(C_vals, cv_scores, label=\"valid\")\n",
    "plt.legend();\n",
    "plt.xlabel(\"C\");\n",
    "plt.ylabel(\"accuracy\");"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "pd.DataFrame({\"C\" : C_vals, \"train\" : train_scores, \"cv\" : cv_scores})"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### 3(c)\n",
    "rubric={points:10}\n",
    "\n",
    "- Using `GridSearchCV`, jointly optimize `max_features` and `C` across all the combinations of values we tried above. \n",
    "  - Note: the code might be a bit slow here. \n",
    "  - Setting `n_jobs=-1` should speed it up if you have a multi-core processor.\n",
    "  - You can reduce the number of folds (e.g. `cv=2`) to speed it up if necessary.\n",
    "- What are the best values of `max_features` and `C` according to your grid search?\n",
    "- Do these best values agree with what you found in parts (a) and (b)?\n",
    "- Generally speaking, _should_ these values agree with what you found in parts (a) and (b)? Explain."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### 3(d)\n",
    "rubric={points:3}\n",
    "\n",
    "- Evaluate your final model on the test set. \n",
    "- How does your test accuracy compare to your validation accuracy? \n",
    "- If they are different: do you think this is because you \"overfitted on the validation set\", or simply random luck?"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Exercise 4: Very short answer questions\n",
    "rubric={points:10}\n",
    "\n",
    "Each question is worth 2 points. Max 2 sentences per answer."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "1. What is the problem with calling `fit_transform` on your test data with `CountVectorizer`? \n",
    "2. Why is it important to follow the Golden Rule? If you violate it, will that give you a worse classifier?\n",
    "3. If you could only access one of `predict` or `predict_proba`, which one would you choose? Briefly explain.\n",
    "4. What are two advantages of using sklearn `Pipeline`s? \n",
    "5. What are two advantages of `RandomizedSearchCV` over `GridSearchCV`?"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Submission to Canvas\n",
    "\n",
    "When you are ready to submit your assignment do the following:\n",
    "\n",
    "\n",
    "1. **NEW STEP**: if you run `pip install canvasutils -U` in your terminal with your environment activated, you should update to a newer version of `canvasutils`. This new version should address a couple pain points:\n",
    "  - The dropdown menu should now be sorted from most to least recent, meaning the default will be hw3.\n",
    "  - For those having trouble with the Jupyter widgets and the dropdowns: if you add the argument `no_widgets=True` to your `submit` call, it should let you do a text-based entry of your key and avoid the dropdowns altogether.\n",
    "1. Run all cells in your notebook to make sure there are no errors by doing `Kernel -> Restart Kernel and Clear All Outputs` and then `Run -> Run All Cells`.\n",
    "2. Save your notebook.\n",
    "3. Convert your notebook to `.html` format using the `convert_notebook()` function below **or** by `File -> Export Notebook As... -> Export Notebook to HTML`.\n",
    "4. Run the code `submit()` below to go through an interactive submission process to Canvas.\n",
    ">For this step, you will need a Canvas *Access Token* token. If you haven't already got one, log-in to Canvas, click `Account` (top-left of the screen), then `Settings`, then scroll down until you see the `+ New Access Token` button. Click that button, give your token any name you like and set the expiry date to Dec 31, 2020. Then click `Generate token`. Save this token in a safe place on your computer as you'll need it for all assignments. Treat the token with as much care as you would an important password. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from canvasutils.submit import submit, convert_notebook\n",
    "\n",
    "# Note: the canvasutils package should have been installed as part of your environment setup - \n",
    "# see https://github.com/UBC-CS/cpsc330/blob/master/docs/setup.md"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# convert_notebook(\"hw3.ipynb\", \"html\")  # uncomment and run when you want to try convert your notebook to HTML (or you can convert manually from the File menu)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# submit(course_code=53561, token=False)  # uncomment and run when ready to submit "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "celltoolbar": "Create Assignment",
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.8.5"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
